Methods for Transcriptome Assembly in the Allopolyploid Brassica napus

Abstract

Canada is the world’s largest producer of canola, and production continues to rise, with an annual growth rate of 9.38% according to FAOSTAT [1]. In 2017, canola acreage surpassed that of wheat in Saskatchewan, the highest producer of both crops in Canada. Country-wide, the total farming area of canola increased by 9.9% to 22.4 million acres, while the wheat area declined slightly to 23.3 million acres [2]. Although Canada is the highest producer of the crop, its yields are lower than those of other countries [1]. To maximize the benefit of this market, canola cultivation could be made more efficient through further characterization of the organism’s genes and their involvement in plant robustness. Such studies using transcriptome analysis have been successful in organisms with relatively small and simple genomes. In B. napus, however, these analyses are complicated by the allopolyploid genome structure that results from ancestral whole genome duplications in the species’ evolutionary history. Homeologous gene pairs, which originate from the orthology between the two B. napus progenitor species, complicate the process of transcriptome assembly. Three modern assemblers, Trinity [3], Oases [4] and SOAPdenovo-Trans [5], were used to generate several de novo transcriptome assemblies for B. napus. A variety of metrics were used to determine the impact that the complex genome structure has on transcriptome studies. In particular, the most important questions for transcriptome assembly in B. napus were how varying the k-mer parameter affects assembly quality, and to what extent similar genes resulting from homeology within B. napus complicate the process of assembly.
The metrics used for evaluating the assemblies include basic assembly statistics, such as the number of contigs and contig lengths (via the N25, N50 and N75 statistics); more involved investigation via comparison to annotated coding DNA sequences; scores from evaluation software for de novo transcriptome assemblies; and, finally, quantification of homeolog differentiation by alignment to previously identified pairs of homeologous genes. These metrics provided a picture of the trade-offs between assembly software packages and the k parameter, which determines the length of the subsequences used to build de Bruijn graphs for de novo transcriptome assembly. It was shown that shorter k-mer lengths produce fewer and more complete contigs, because a shorter overlap is required between read sequences, whereas longer k-mer lengths increase the sensitivity of an assembler to sequence variation between similar gene sequences. The Trinity assembler outperformed Oases and SOAPdenovo-Trans across the full breadth of evaluation metrics, generating longer transcripts with fewer chimeras between homeologous gene pairs.
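
The contig-length statistics mentioned above (N25, N50 and N75) can be illustrated with a minimal Python sketch. It uses the common definition of Nx: the contig length L such that contigs of length L or longer together contain at least x% of all assembled bases. The function name `nx_statistic` and the example lengths are hypothetical and are not taken from the study, which may have computed these values with different tooling.

```python
def nx_statistic(contig_lengths, fraction):
    """Return the Nx value for a list of contig lengths.

    Nx is the length L such that contigs of length >= L together
    cover at least `fraction` of the total assembled bases
    (fraction=0.5 gives N50).
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")

    threshold = fraction * sum(contig_lengths)
    running_total = 0
    # Walk from the longest contig down until the threshold is reached.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
n25 = nx_statistic(example_lengths, 0.25)
n50 = nx_statistic(example_lengths, 0.50)
n75 = nx_statistic(example_lengths, 0.75)
print(n25, n50, n75)
```

A higher N50 indicates that more of the assembly sits in long contigs, but it does not by itself show that those contigs are correct, which is presumably why the abstract pairs these statistics with reference-based and homeolog-based checks.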


Related resources

Karyotype and identification of all homoeologous chromosomes of allopolyploid Brassica napus and its diploid progenitors.

Investigating recombination of homoeologous chromosomes in allopolyploid species is central to understanding plant breeding and evolution. However, examining chromosome pairing in the allotetraploid Brassica napus has been hampered by the lack of chromosome-specific molecular probes. In this study, we establish the identification of all homoeologous chromosomes of allopolyploid B. napus by usin...


Capturing sequence variation among flowering-time regulatory gene homologs in the allopolyploid crop species Brassica napus

Flowering, the transition from the vegetative to the generative phase, is a decisive time point in the lifecycle of a plant. Flowering is controlled by a complex network of transcription factors, photoreceptors, enzymes and miRNAs. In recent years, several studies gave rise to the hypothesis that this network is also strongly involved in the regulation of other important lifecycle processes ran...


Construction of Brassica A and C genome-based ordered pan-transcriptomes for use in rapeseed genomic research

This data article reports the establishment of the first pan-transcriptome resources for the Brassica A and C genomes. These were developed using existing coding DNA sequence (CDS) gene models from the now-published Brassica oleracea TO1000 and Brassica napus Darmor-bzh genome sequence assemblies representing the chromosomes of these species, along with preliminary CDS models from an updated Br...


Genetic and Epigenetic Changes in Oilseed Rape (Brassica napus L.) Extracted from Intergeneric Allopolyploid and Additions with Orychophragmus

Allopolyploidization with the merger of the genomes from different species has been shown to be associated with genetic and epigenetic changes. But the maintenance of such alterations related to one parental species after the genome is extracted from the allopolyploid remains to be detected. In this study, the genome of Brassica napus L. (2n = 38, genomes AACC) was extracted from its intergener...


Genome-wide gene expression perturbation induced by loss of C2 chromosome in allotetraploid Brassica napus L.

Aneuploidy with loss of entire chromosomes from normal complement disrupts the balanced genome and is tolerable only by polyploidy plants. In this study, the monosomic and nullisomic plants losing one or two copies of C2 chromosome from allotetraploid Brassica napus L. (2n = 38, AACC) were produced and compared for their phenotype and transcriptome. The monosomics gave a plant phenotype very si...


Publication date: 2017